Similar resources
Normalising Audio Transcriptions for Unwritten Languages
The task of documenting the world’s languages is a mainstream activity in linguistics which is yet to spill over into computational linguistics. We propose a new task of transcription normalisation as an algorithmic method for speeding up the process of transcribing audio sources, leading to text collections of usable quality. We report on the application of sentence and word alignment algorith...
Large-Scale Text Collection for Unwritten Languages
Existing methods for collecting texts from endangered languages are not creating the quantity of data that is needed for corpus studies and natural language processing tasks. This is because the process of transcribing and translating from audio recordings is too onerous. A more effective method, we argue, is to involve local speakers in the field location, using an audio-only translation inter...
Text-To-Speech for Languages without an Orthography
Speech synthesis models are typically built from a corpus of speech that has accurate transcriptions. However, many of the languages of the world do not have a standardized writing system. This paper is an initial attempt at building synthetic voices for such languages. It may seem useless to develop a text-to-speech system when there is no text available. But we will discuss some well defined ...
Orthography of Hamza in the Persian and Arabic Languages
Among the letters of the Arabic alphabet, hamza is the only vowel with four orthographic variations: (alone), (above a wāw), (above and under an ‘alif) and (above a dotless yā'). This orthographic variation has caused spelling problems for writers, mainly those who write in both Arabic and Persian. The present article is to reach the following objectives: 1. introducing the right orthography...
Acquisition of Translation Lexicons for Historically Unwritten Languages via Bridging Loanwords
With the advent of informal electronic communications such as social media, colloquial languages that were historically unwritten are being written for the first time in heavily code-switched environments. We present a method for inducing portions of translation lexicons through the use of expert knowledge in these settings where there are approximately zero resources available other than a lan...
Journal
Journal title: The Geographical Journal
Year: 1912
ISSN: 0016-7398
DOI: 10.2307/1778241